
Keyword Search Result

[Keyword] reinforcement learning (72 hits)

Hits 41-60 of 72

  • Heuristic Function Negotiation for Markov Decision Process and Its Application in UAV Simulation

    Fengfei ZHAO  Zheng QIN  Zhuo SHAO  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E97-D No:1  Page(s): 89-97

    Traditional reinforcement learning (RL) methods can solve Markov decision processes (MDPs) online, but they cannot effectively use a priori knowledge to guide the learning process, so exploring for the optimal policy is time-consuming and ignores problem-specific information. To tackle this problem, this paper proposes heuristic function negotiation (HFN) as an online learning framework. The HFN framework extends MDPs by introducing heuristic functions: it replaces the two-layer state-action structure of traditional RL with a three-layer structure in which multiple heuristic functions can be defined to suit the problem at hand. The framework lets the functions negotiate, using different algorithms, to determine the appropriate action, and it adjusts the influence of each function according to the received rewards. By encoding domain knowledge in the heuristic functions, HFN speeds up the solution of MDPs; user preferences can also be reflected in the learning process, which improves the flexibility of RL. Experiments show that, with reasonable heuristic functions, the HFN framework learns more efficiently than traditional RL. We also apply HFN to an air combat simulation of unmanned aerial vehicles (UAVs), where different function settings lead to different combat behaviors.
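
    The negotiation idea can be pictured with a small sketch: several hand-written heuristic functions score each candidate action, the weighted scores are combined to pick the action, and the weights are adjusted from the observed reward. The heuristics, weights, and update rule below are assumptions for illustration, not the authors' exact HFN algorithms.

```python
import random

# Illustrative heuristics for a UAV-like task (assumed, not from the paper).
def avoid_threat(state, action):
    return 1.0 if action != "advance" else 0.0

def prefer_goal(state, action):
    return 1.0 if action == "advance" else 0.2

heuristics = [avoid_threat, prefer_goal]
weights = [1.0, 1.0]                      # influence of each heuristic
actions = ["advance", "evade", "hold"]

def negotiate(state, epsilon=0.1):
    """Pick an action by a weighted vote of the heuristic functions."""
    if random.random() < epsilon:         # keep some exploration
        return random.choice(actions)
    score = {a: sum(w * h(state, a) for w, h in zip(weights, heuristics))
             for a in actions}
    return max(score, key=score.get)

def adjust(state, action, reward, lr=0.05):
    """Heuristics that favoured the chosen action share in its reward."""
    for i, h in enumerate(heuristics):
        weights[i] += lr * reward * h(state, action)

a = negotiate(state=None)
adjust(None, a, reward=1.0)
```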

  • An Improved Model of Ant Colony Optimization Using a Novel Pheromone Update Strategy

    Pooia LALBAKHSH  Bahram ZAERI  Ali LALBAKHSH  

     
    PAPER-Fundamentals of Information Systems

    Vol: E96-D No:11  Page(s): 2309-2318

    This paper introduces a novel pheromone update strategy to improve the functionality of ant colony optimization (ACO) algorithms. The modification extends the search area through an optimistic reinforcement strategy in which not only the most desirable sub-solution is reinforced at each step, but other partial solutions with acceptable levels of optimality are also favored. This makes those potential solutions more likely to be selected by subsequent artificial ants, increasing overall exploration and yielding a more exhaustive algorithm. The modification can be adopted in any ant-based optimization algorithm; this paper focuses on two static problems, the travelling salesman problem and classification rule mining. To address them, we took two ACO algorithms, ACS (Ant Colony System) and AntMiner 3.0, and modified their pheromone update strategies. Simulation experiments show that the novel pheromone update method improves the behavior of both algorithms on almost all the performance evaluation metrics.
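
    A minimal sketch of the optimistic update described above, assuming a generic edge-based pheromone table: every solution whose quality is within a tolerance of the iteration's best also deposits pheromone, not only the single best one. The parameter names and deposit amounts are illustrative, not the exact ACS/AntMiner 3.0 formulas.

```python
# Optimistic pheromone update: near-best solutions are also reinforced.
def update_pheromone(pheromone, solutions, evaporation=0.1, tolerance=0.9):
    """solutions: list of (edges_used, quality) built by the ants this iteration."""
    best_quality = max(q for _, q in solutions)
    # Standard evaporation on every edge.
    for edge in pheromone:
        pheromone[edge] *= (1.0 - evaporation)
    # Reinforce the best solution *and* the acceptable near-best ones.
    for edges, quality in solutions:
        if quality >= tolerance * best_quality:
            for edge in edges:
                pheromone[edge] += quality
    return pheromone

pheromone = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 1.0}
ants = [([("A", "B"), ("B", "C")], 0.8), ([("A", "C")], 0.95)]
update_pheromone(pheromone, ants)
```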

  • Multi-Channel Cooperative Spectrum Sensing in Cognitive Radio Networks

    Ji-Hoon LEE  Woo-Jin SONG  

     
    LETTER-Communication Theory and Signals

    Vol: E96-A No:9  Page(s): 1909-1913

    Spectrum sensing is one of the main functions in cognitive radio networks. To improve sensing performance and increase spectrum efficiency, a number of cooperative spectrum sensing methods have been proposed; however, most of them have focused on a single-channel environment. In this letter, we present a novel cooperative spectrum sensing method based on cooperator selection in a multi-channel cognitive radio network. Using reinforcement learning, a cognitive radio user can select reliable and robust cooperators without any a priori knowledge. With the proposed method, a cognitive radio user achieves better sensing capability and overcomes the performance degradation caused by malicious users or erratic user behavior. Numerical results show that the proposed method achieves excellent performance.
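
    One way to picture the cooperator-selection step is a simple per-channel value-learning sketch; the state, action, and reward encodings below are assumptions, not the letter's exact formulation. Each candidate cooperator keeps a reliability estimate per channel, the best-rated ones are polled, and the estimates are updated according to how well each report agreed with the final fused decision.

```python
from collections import defaultdict

values = defaultdict(float)   # (channel, cooperator) -> estimated reliability

def select_cooperators(channel, candidates, k=3):
    """Poll the k cooperators currently rated most reliable on this channel."""
    return sorted(candidates, key=lambda c: values[(channel, c)], reverse=True)[:k]

def update(channel, cooperator, agreed_with_fusion, lr=0.1):
    """Nudge the reliability estimate towards agreement with the fused decision."""
    reward = 1.0 if agreed_with_fusion else -1.0
    values[(channel, cooperator)] += lr * (reward - values[(channel, cooperator)])

chosen = select_cooperators(channel=5, candidates=["u1", "u2", "u3", "u4"])
for c in chosen:
    update(5, c, agreed_with_fusion=True)
```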

  • Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting

    Ning XIE  Hirotaka HACHIYA  Masashi SUGIYAMA  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E96-D No:5  Page(s): 1134-1144

    Oriental ink painting, called Sumi-e, is one of the most distinctive painting styles and has attracted artists around the world. The major challenges in Sumi-e simulation are abstracting complex scene information and reproducing smooth, natural brush strokes. To generate such strokes automatically, we propose modeling the brush as a reinforcement learning agent and letting the agent learn the desired brush trajectories by maximizing the sum of rewards in the policy search framework. To achieve better performance, we provide an elaborate design of actions, states, and rewards specifically tailored to a Sumi-e agent. The effectiveness of the proposed approach is demonstrated through experiments on Sumi-e simulation.

  • Reinforcement Learning of Optimal Supervisor for Discrete Event Systems with Different Preferences

    Koji KAJIWARA  Tatsushi YAMASAKI  

     
    PAPER-Concurrent Systems

    Vol: E96-A No:2  Page(s): 525-531

    In this paper, we propose an optimal supervisory control method for discrete event systems (DESs) with different preferences. In our previous work, we proposed an optimal supervisory control method based on reinforcement learning; here we extend it to a system that consists of several local systems. The system is modeled as a decentralized DES (DDES) composed of local DESs and supervised by a central supervisor, and we assume that the supervisor and each local DES have their own preferences, each represented by a preference function. We introduce a new value function based on these preference functions and then propose a reinforcement-learning method by which the optimal supervisor is learned for the DDES: the supervisor learns how to assign the control pattern so as to maximize the value function. The proposed method provides a general framework of optimal supervisory control for a DDES that consists of several local systems with different preferences. We show the efficiency of the proposed method through a computer simulation.
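
    A rough sketch of the idea, assuming a finite set of candidate control patterns: the reward blends the supervisor's preference function with those of the local DESs, and ordinary Q-learning is run over (state, control pattern) pairs. The blending weight and function names are assumptions, not the paper's exact value function.

```python
import random

Q = {}   # (state, frozenset(control_pattern)) -> estimated value

def blended_reward(state, pattern, pref_sup, prefs_local, alpha=0.5):
    """Combine the supervisor's preference with the local DESs' preferences."""
    local = sum(p(state, pattern) for p in prefs_local) / len(prefs_local)
    return alpha * pref_sup(state, pattern) + (1.0 - alpha) * local

def choose_pattern(state, patterns, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(patterns)
    return max(patterns, key=lambda p: Q.get((state, frozenset(p)), 0.0))

def q_update(state, pattern, reward, next_state, patterns, lr=0.1, gamma=0.9):
    key = (state, frozenset(pattern))
    old = Q.get(key, 0.0)
    best_next = max(Q.get((next_state, frozenset(p)), 0.0) for p in patterns)
    Q[key] = old + lr * (reward + gamma * best_next - old)

patterns = [{"start_a"}, {"start_b"}, {"start_a", "start_b"}]
p = choose_pattern("idle", patterns)
r = blended_reward("idle", p, lambda s, a: 1.0, [lambda s, a: 0.5])
q_update("idle", p, r, "busy", patterns)
```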

  • Multi-Task Approach to Reinforcement Learning for Factored-State Markov Decision Problems

    Jaak SIMM  Masashi SUGIYAMA  Hirotaka HACHIYA  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E95-D No:10  Page(s): 2426-2437

    Reinforcement learning (RL) is a flexible framework for learning a decision rule in an unknown environment. However, a large number of samples are often required for finding a useful decision rule. To mitigate this problem, the concept of transfer learning has been employed to utilize knowledge obtained from similar RL tasks. However, most approaches developed so far are useful only in low-dimensional settings. In this paper, we propose a novel transfer learning idea that targets problems with high-dimensional states. Our idea is to transfer knowledge between state factors (e.g., interacting objects) within a single RL task. This allows the agent to learn the system dynamics of the target RL task with fewer data samples. The effectiveness of the proposed method is demonstrated through experiments.

  • An Adaptive Method to Acquire QoS Class Allocation Policy Based on Reinforcement Learning

    Nagao OGINO  Hajime NAKAMURA  

     
    PAPER-Network

    Vol: E95-B No:9  Page(s): 2828-2837

    For real-time services such as VoIP and videoconferencing delivered through a multi-domain MPLS network, it is vital to guarantee the end-to-end QoS of inter-domain paths, so each transit domain must allocate an appropriate QoS class to those paths. Because every domain has its own QoS class allocation policy, each domain must allocate classes adaptively based on estimates of the policies adopted in other domains. This paper proposes an adaptive method for acquiring a QoS class allocation policy through reinforcement learning: the method learns an appropriate policy through experience in the actual allocation process, so it can adapt to a complex environment in which the arrival of inter-domain path requests does not follow a simple Poisson process and various allocation policies are adopted in other domains. The proposed method updates the allocation policy whenever a QoS class is actually allocated to an inter-domain path, so the policies exercised most often in real operation are also refined most frequently; consequently, the method adapts rapidly to variations in the surrounding environment. Simulation results verify that the proposed method can quickly adapt to variations in the arrival process of inter-domain path requests and in the QoS class allocation policies of other domains.
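
    The per-request update can be sketched as a small value-learning loop. The encodings below are assumptions for illustration (state = requested class plus a coarse local load level, action = allocated class, reward = whether end-to-end QoS was met minus an over-provisioning cost); the paper's actual state and reward design may differ.

```python
import random
from collections import defaultdict

Q = defaultdict(float)   # (requested_class, load_level, allocated_class) -> value

def allocate(requested_class, load_level, classes, epsilon=0.05):
    """Pick the QoS class with the highest learned value, with some exploration."""
    if random.random() < epsilon:
        return random.choice(classes)
    return max(classes, key=lambda c: Q[(requested_class, load_level, c)])

def learn(requested_class, load_level, allocated_class, e2e_qos_met, cost, lr=0.1):
    reward = (1.0 if e2e_qos_met else -1.0) - cost   # penalise over-provisioning
    key = (requested_class, load_level, allocated_class)
    Q[key] += lr * (reward - Q[key])

c = allocate("gold", load_level="low", classes=["EF", "AF1", "BE"])
learn("gold", "low", c, e2e_qos_met=True, cost=0.2)
```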

  • Option-Based Monte Carlo Algorithm with Conditioned Updating to Learn Conflict-Free Task Allocation in Transport Applications

    Alex VALDIVIELSO  Toshiyuki MIYAMOTO  

     
    PAPER

    Vol: E94-A No:12  Page(s): 2810-2820

    In automated transport applications, designing a task allocation policy becomes a complex problem when the system contains several agents and conflicts between them may arise, degrading the system's performance. Achieving a globally optimal result in this situation would require complete knowledge of the system's model, which is infeasible for real systems with huge state spaces and unknown state-transition probabilities. Reinforcement learning (RL) methods have performed well at approximating optimal results in task processing without requiring prior knowledge of the system's model. However, to our knowledge, few RL methods address the task allocation problem in transportation systems, and fewer still allocate tasks directly while accounting for the risk of conflicts between agents. In this paper, we propose an option-based RL algorithm with conditioned updating that lets agents learn a task allocation policy which completes tasks while preventing conflicts between them. We use a multicar elevator (MCE) system as the test application. Simulation results show that, with our algorithm, elevator cars in the same shaft effectively learn to respond to service calls without interfering with each other, under different passenger arrival rates and system configurations.

  • An Adaptive Cooperative Spectrum Sensing Scheme Using Reinforcement Learning for Cognitive Radio Sensor Networks

    Thuc KIEU-XUAN  Insoo KOO  

     
    LETTER-Network

    Vol: E94-B No:5  Page(s): 1456-1459

    This letter proposes a novel decision fusion algorithm for cooperative spectrum sensing in cognitive radio sensor networks, in which a reinforcement learning algorithm at the fusion center estimates the sensing performance of the local spectrum sensing nodes. The estimates are then used to weight the local decisions in the final decision making process, which is based on the Chair-Varshney optimal decision fusion rule. Simulation results show that the sensing accuracy of the proposed scheme is comparable to that of the Chair-Varshney optimal fusion scheme even though it requires no knowledge of the prior probabilities or the local sensing performance of the spectrum sensing nodes.
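
    For reference, the Chair-Varshney rule combines the local binary decisions with log-likelihood weights built from each node's detection and false-alarm probabilities. The sketch below fuses decisions with such estimates and nudges them online; the simple counting-style update is an assumption for illustration, not the letter's learning rule.

```python
import math

# Estimated detection (pd) and false-alarm (pf) probabilities per sensing node.
sensors = {"s1": {"pd": 0.8, "pf": 0.1}, "s2": {"pd": 0.6, "pf": 0.2}}

def fuse(local_decisions, prior_busy=0.5):
    """Chair-Varshney fusion: weighted log-likelihood sum over local decisions."""
    llr = math.log(prior_busy / (1.0 - prior_busy))
    for sid, u in local_decisions.items():
        pd, pf = sensors[sid]["pd"], sensors[sid]["pf"]
        llr += math.log(pd / pf) if u == 1 else math.log((1 - pd) / (1 - pf))
    return 1 if llr > 0 else 0

def refine(sid, local_decision, fused_state, lr=0.02):
    """Move the pd/pf estimate towards the node's observed behaviour."""
    key = "pd" if fused_state == 1 else "pf"
    sensors[sid][key] += lr * (local_decision - sensors[sid][key])

decision = fuse({"s1": 1, "s2": 0})
refine("s1", 1, decision)
```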

  • Model-Based Reinforcement Learning in Multiagent Systems with Sequential Action Selection

    Ali AKRAMIZADEH  Ahmad AFSHAR  Mohammad Bagher MENHAJ  Samira JAFARI  

     
    PAPER-Fundamentals of Information Systems

    Vol: E94-D No:2  Page(s): 255-263

    Model-based reinforcement learning uses the information gathered during each experience more efficiently than model-free reinforcement learning. This is especially attractive in multiagent systems, since a large number of experiences are necessary to achieve good performance. In this paper, model-based reinforcement learning is developed for a group of self-interested agents with sequential action selection, based on traditional prioritized sweeping. Each decision-making situation in this learning process, called an extensive Markov game, is modeled as an n-person general-sum extensive-form game with perfect information. A modified version of backward induction is proposed for action selection, which adjusts the tradeoff between selecting subgame perfect equilibrium points as the optimal joint actions and learning new joint actions. The algorithm is proved to be convergent, and its behavior is discussed in light of new results on the convergence of traditional prioritized sweeping.
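
    As background, single-agent prioritized sweeping with a deterministic learned model looks roughly like the sketch below; the paper's contribution, extending this to sequential-action general-sum Markov games with modified backward induction, is not reproduced here.

```python
import heapq, itertools
from collections import defaultdict

gamma, theta, n_backups = 0.95, 1e-4, 10
Q = defaultdict(float)      # (state, action) -> value
model = {}                  # (state, action) -> (reward, next_state)
preds = defaultdict(set)    # state -> set of (state, action) pairs leading to it
pqueue, tie = [], itertools.count()

def experience(s, a, r, s_next, actions):
    """Record one real transition, then plan from the learned model."""
    model[(s, a)] = (r, s_next)
    preds[s_next].add((s, a))
    p = abs(r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
    if p > theta:
        heapq.heappush(pqueue, (-p, next(tie), s, a))
    for _ in range(n_backups):                 # planning sweeps
        if not pqueue:
            break
        _, _, s1, a1 = heapq.heappop(pqueue)
        r1, s2 = model[(s1, a1)]
        Q[(s1, a1)] = r1 + gamma * max(Q[(s2, b)] for b in actions)
        for (s0, a0) in preds[s1]:             # requeue affected predecessors
            r0, _ = model[(s0, a0)]
            p0 = abs(r0 + gamma * max(Q[(s1, b)] for b in actions) - Q[(s0, a0)])
            if p0 > theta:
                heapq.heappush(pqueue, (-p0, next(tie), s0, a0))

experience("s0", "left", 0.0, "s1", actions=["left", "right"])
```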

  • Least Absolute Policy Iteration--A Robust Approach to Value Function Approximation

    Masashi SUGIYAMA  Hirotaka HACHIYA  Hisashi KASHIMA  Tetsuro MORIMURA  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E93-D No:9  Page(s): 2555-2565

    Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task.
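
    The absolute-loss fitting step can be written directly as a linear program. The toy sketch below, using scipy.optimize.linprog and made-up features, minimises the sum of absolute Bellman residuals with slack variables; it illustrates the kind of formulation the abstract refers to, not the paper's full policy-iteration loop.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.95
Phi      = np.array([[1.0, 0.2], [1.0, 0.7], [1.0, 0.4]])   # phi(s_i), toy data
Phi_next = np.array([[1.0, 0.7], [1.0, 0.4], [1.0, 0.1]])   # phi(s'_i)
r        = np.array([0.0, 0.0, 1.0])

A = Phi - gamma * Phi_next                   # n x d residual matrix
n, d = A.shape
# Variables x = [w (d), b (n)]; minimise sum(b) subject to -b <= A w - r <= b.
c = np.concatenate([np.zeros(d), np.ones(n)])
A_ub = np.block([[ A, -np.eye(n)],
                 [-A, -np.eye(n)]])
b_ub = np.concatenate([r, -r])
bounds = [(None, None)] * d + [(0, None)] * n
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w = res.x[:d]                                # robust value-function weights
```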

  • Reasoning on the Self-Organizing Incremental Associative Memory for Online Robot Path Planning

    Aram KAWEWONG  Yutaro HONDA  Manabu TSUBOYAMA  Osamu HASEGAWA  

     
    PAPER-Artificial Intelligence and Cognitive Science

    Vol: E93-D No:3  Page(s): 569-582

    Robot path planning is one of the important issues in robotic navigation. This paper presents a novel robot path-planning approach based on associative memory using Self-Organizing Incremental Neural Networks (SOINN). In the proposed method, an environment is first autonomously divided into a set of path-fragments by junctions, and each fragment is represented by a sequence of preliminarily generated common patterns (CPs). In an online manner, the robot regards the current path as associated path-fragments connected by junctions. A reasoning technique is additionally proposed for decision making at each junction to shorten the exploration time. Unlike other methods, ours does not ignore the important information about the regions between junctions (the path-fragments), and the resulting number of path-fragments is smaller than in other methods. Evaluation is done via Webots 3D physical simulations and real-robot experiments in which only distance sensors are available. The results show that our method represents the environment effectively: it enables the robot to solve the goal-oriented navigation problem in only one episode, which is fewer than most reinforcement learning (RL) based methods require; the running time is proved finite and scales well with the environment; and the resulting number of path-fragments matches the environment well.

  • Synchronization of Chaotic Systems without Direct Connections Using Reinforcement Learning

    Norihisa SATO  Masaharu ADACHI  

     
    PAPER

    Vol: E92-A No:4  Page(s): 958-965

    In this paper, we propose a control method for the synchronization of chaotic systems that, unlike existing methods such as the one proposed by Pecora and Carroll in 1990, does not require direct connections between the systems. The method is based on a reinforcement learning algorithm. We apply it to two discrete-time chaotic systems with mismatched parameters and achieve M-step delayed synchronization. Moreover, we extend the proposed method to the synchronization of continuous-time chaotic systems.

  • A Nonlinear Approach to Robust Routing Based on Reinforcement Learning with State Space Compression and Adaptive Basis Construction

    Hideki SATOH  

     
    PAPER-Nonlinear Problems

    Vol: E91-A No:7  Page(s): 1733-1740

    A robust routing algorithm was developed based on reinforcement learning. It uses (1) reward-weighted principal component analysis, which compresses the state space of a network with a large number of nodes and eliminates the adverse effects of various types of attacks or disturbance noises, (2) activity-oriented index allocation, which adaptively constructs a basis used for approximating routing probabilities, and (3) a newly developed space compression method based on a potential model that reduces the space of routing probabilities. The algorithm takes all the network states into account and reduces the adverse effects of disturbance noises; it thus works well, and the frequency of routing loops and of falling into a local optimum is reduced even when the routing information is disturbed.
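
    The first ingredient, reward-weighted principal component analysis, can be sketched as follows with synthetic data; the particular weighting scheme is an assumption for illustration. Samples are weighted by their rewards before the covariance is formed, so the retained components emphasise the directions of the network state that matter for the reward.

```python
import numpy as np

rng = np.random.default_rng(2)
states  = rng.normal(size=(500, 20))          # raw network-state vectors (toy)
rewards = rng.random(500)                     # e.g. normalised routing reward

w = rewards / rewards.sum()                   # per-sample weights from rewards
mean = (w[:, None] * states).sum(axis=0)
Xc = states - mean
cov = (w[:, None] * Xc).T @ Xc                # reward-weighted covariance
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, np.argsort(eigvals)[::-1][:5]]
compressed = Xc @ top                         # low-dimensional state for routing RL
```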

  • Reinforcement Learning with Orthonormal Basis Adaptation Based on Activity-Oriented Index Allocation

    Hideki SATOH  

     
    PAPER-Nonlinear Problems

    Vol: E91-A No:4  Page(s): 1169-1176

    An orthonormal basis adaptation method for function approximation was developed and applied to reinforcement learning with a multi-dimensional continuous state space. First, the basis used for linear function approximation of a control function is set to an orthonormal basis. Next, basis elements with small activities are replaced with other candidate elements as learning progresses; as this replacement is repeated, the number of basis elements with large activities increases. Example chaos control problems for multiple logistic maps were solved, demonstrating that the method can modify the basis while preserving orthonormality in response to changes in the environment, improving the performance of reinforcement learning and eliminating the adverse effects of redundant noisy states.

  • A State Space Compression Method Based on Multivariate Analysis for Reinforcement Learning in High-Dimensional Continuous State Spaces

    Hideki SATOH  

     
    PAPER-Nonlinear Problems

    Vol: E89-A No:8  Page(s): 2181-2191

    A state space compression method based on multivariate analysis was developed and applied to reinforcement learning in high-dimensional continuous state spaces. First, useful components of the state variables of an environment are extracted and meaningless ones are removed by multiple regression analysis. Next, the state space is compressed by principal component analysis so that only a few principal components express the dynamics of the environment. Then, a basis of a feature space for function approximation is constructed from orthonormal bases of the important principal components. A feature space is thus autonomously constructed without preliminary knowledge of the environment, and the environment is effectively expressed in that feature space. An example synchronization problem for multiple logistic maps was solved using this method, demonstrating that it overcomes the curse of dimensionality and achieves high performance without suffering from disturbance states.
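
    A compact sketch of the two-stage compression with synthetic data (the thresholds are illustrative): multiple regression discards state variables whose coefficients are negligible, and PCA then reduces the remaining variables to the few components that explain most of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                 # observed state variables (toy)
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=200)   # learning target

# Multiple regression: keep variables with non-negligible coefficients.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
keep = np.abs(coef) > 0.1
X_sel = X[:, keep]

# PCA on the selected variables: keep components explaining most of the variance.
Xc = X_sel - X_sel.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = (s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
Z = Xc @ Vt[:k].T                             # compressed state for RL
```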

  • Exploiting Intelligence in Fighting Action Games Using Neural Networks

    Byeong Heon CHO  Sung Hoon JUNG  Yeong Rak SEONG  Ha Ryoung OH  

     
    PAPER-Biocybernetics, Neurocomputing

    Vol: E89-D No:3  Page(s): 1249-1256

    This paper proposes novel methods for providing intelligence to characters in fighting action games by using neural networks. First, we consider how a character learns the basic game rules and plays against randomly acting opponents. Since each action takes more than one time unit in typical fighting action games, the result of a character's action is exposed not immediately but several time units later. We evaluate the fitness of a decision by the relative score change it causes: whenever the scores of the fighting characters change, the decision that caused the change is identified, and the neural network is trained using the score difference together with the input and output values that produced that decision. Second, we address how to cope more effectively with opponents that act according to predefined action patterns, using the opponents' past actions to find the optimal counter-actions for those patterns. Lastly, a method for learning moving actions is proposed. To evaluate the performance of the proposed algorithms, we implement a simple fighting action game in which the proposed intelligent character (IC) fights opponent characters (OCs) that act randomly or with predefined action patterns. The results show that the IC learns the game rules and finds the optimal counter-actions for the opponents' action patterns by itself.
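
    The delayed-credit mechanism can be pictured with a tiny sketch using a single linear layer instead of the paper's neural network; the feature and learning-rate choices are assumptions. The input/output pair behind each decision is buffered, and when the score finally changes, that pair is trained with the score difference as the signal.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(4, 3))       # 4 state features -> 3 actions
pending = None                                # (features, action) awaiting its outcome

def act(features):
    """Pick the highest-scoring action and remember what produced it."""
    global pending
    scores = features @ W
    action = int(np.argmax(scores))
    pending = (features, action)
    return action

def on_score_change(score_delta, lr=0.05):
    """Reinforce (or weaken) the buffered decision by the observed score change."""
    global pending
    if pending is None:
        return
    features, action = pending
    W[:, action] += lr * score_delta * features
    pending = None

a = act(np.array([1.0, 0.2, 0.0, 0.5]))
on_score_change(score_delta=+1.0)
```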

  • Decentralized Supervisory Control of Discrete Event Systems Based on Reinforcement Learning

    Tatsushi YAMASAKI  Toshimitsu USHIO  

     
    PAPER

    Vol: E88-A No:11  Page(s): 3045-3050

    A supervisor, as proposed by Ramadge and Wonham, controls a discrete event system (DES) so that it satisfies logical control specifications; however, such control requires a precise description of both the specifications and the DES. This paper proposes a synthesis method, based on reinforcement learning, for supervisors of decentralized DESs, in which several local supervisors exist and control the DES jointly. Costs for disabling events and for event occurrences, as well as the control specifications, are taken into account. By using reinforcement learning, the proposed method remains applicable under imprecise specifications and in uncertain environments.

  • Analysis on the Parameters of the Evolving Artificial Agents in Sequential Bargaining Game

    Seok-Cheol CHANG  Joung-Il YUN  Ju-Sang LEE  Sang-Uk LEE  Nitaigour-Premchand MAHALIK  Byung-Ha AHN  

     
    LETTER

    Vol: E88-D No:9  Page(s): 2098-2101

    Over the past few years, a considerable number of studies have modeled the bargaining game with artificial agents using within-model interaction. However, very few attempts have been made to study the interaction and co-evolutionary process among heterogeneous artificial agents. We therefore present two kinds of artificial agents, based on a genetic algorithm (GA) and on reinforcement learning (RL), that play the game through between-model interaction. We investigate their co-evolutionary processes and analyze their parameters using analysis of variance.

  • CHQ: A Multi-Agent Reinforcement Learning Scheme for Partially Observable Markov Decision Processes

    Hiroshi OSADA  Satoshi FUJITA  

     
    PAPER-Artificial Intelligence and Cognitive Science

    Vol: E88-D No:5  Page(s): 1004-1011

    In this paper, we propose a new reinforcement learning scheme called CHQ that efficiently acquires appropriate policies for partially observable Markov decision processes (POMDPs) involving probabilistic state transitions, which frequently occur in multi-agent systems where each agent independently takes a probabilistic action based on a partial observation of the underlying environment. The key idea of CHQ is to extend the HQ-learning proposed by Wiering et al. so that it learns the activation order of the MDP subtasks as well as an appropriate policy for each subtask. The proposed scheme is evaluated experimentally; the results imply that it can acquire a deterministic policy with a sufficiently high success rate, even when the given task is a POMDP with probabilistic state transitions.

Hits 41-60 of 72